Detecting depression severity using weighted random forest and oxidative stress biomarkers

This study uses machine learning to detect the severity of major depressive disorder (MDD) through binary and multiclass classification. We compared models that used only oxidative stress biomarkers with models that also incorporated sociodemographic and health-related factors. The analysis draws on data from 830 participants, with severity based on the Patient Health Questionnaire (PHQ-9) score. In binary classification, the Random Forest (RF) classifier achieved the highest Area Under the Curve (AUC), 0.84, when all features were included. In multiclass classification, the AUC improved from 0.84 with only oxidative stress biomarkers to 0.88 when all features were included. To address class imbalance, we applied weighted classifiers and the Synthetic Minority Over-sampling Technique (SMOTE). Weighted random forest (WRF) further improved multiclass classification, achieving an AUC of 0.91. Statistical tests, including the Friedman test and the Conover post-hoc test, confirmed significant differences in model performance, with WRF using all features outperforming the other models. Feature importance analysis shows that oxidative stress biomarkers, particularly GSH, rank highest among all features. Clinicians can use these results to support decision-making by considering oxidative stress biomarkers alongside the standard criteria for depression diagnosis.
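As an illustration of the class-weighting idea behind a weighted classifier, the following minimal sketch maps PHQ-9 scores to severity classes and derives inverse-frequency class weights. It rests on two assumptions that the abstract does not state: that the standard PHQ-9 severity cut-offs define the classes, and that weights follow the common "balanced" heuristic, n_samples / (n_classes * class_count). The function names and the example scores are hypothetical and are not taken from the study.

```python
from collections import Counter

# Standard PHQ-9 severity bands (an assumption; the abstract does not
# list the cut-offs used to define the classes).
PHQ9_BANDS = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]


def phq9_severity(score):
    """Map a PHQ-9 total score (0-27) to a severity label."""
    for low, high, label in PHQ9_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"PHQ-9 score out of range: {score}")


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their misclassification
    costs more during training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical scores for illustration only; not data from the study.
scores = [2, 3, 1, 4, 0, 6, 8, 12, 21]
labels = [phq9_severity(s) for s in scores]
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:>18}: {weight:.3f}")
```

A dictionary of this form could, for example, be passed as the `class_weight` argument of scikit-learn's `RandomForestClassifier`; whether the study used this particular weighting scheme is not stated here.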


Figure 1: Confusion matrix of LR with oxidative stress biomarkers as the main features.

Figure 2: Confusion matrix of LR with all features.

Figure 3: Confusion matrix of RF with all features.

Figure 4: Confusion matrix of RF with oxidative stress biomarkers as the main features.

Figure 5: Confusion matrix of KNN with all features.

Figure 6: Confusion matrix of KNN with oxidative stress biomarkers as the main features.

Figure 7: Confusion matrix of SVM with all features.

Figure 8: Confusion matrix of SVM with oxidative stress biomarkers as the main features.

Figure 9: Confusion matrix of NB with all features.

Figure 10: Confusion matrix of NB with oxidative stress biomarkers as the main features.

Figure 11: Confusion matrix of LR with oxidative stress biomarkers as the main features.

Figure 12: Confusion matrix of LR with all features.

Figure 13: Confusion matrix of RF with all features.

Figure 14: Confusion matrix of RF with oxidative stress biomarkers as the main features.

Figure 15: Confusion matrix of KNN with all features.

Figure 16: Confusion matrix of KNN with oxidative stress biomarkers as the main features.

Figure 17: Confusion matrix of SVM with all features.

Figure 18: Confusion matrix of SVM with oxidative stress biomarkers as the main features.

Figure 19: Confusion matrix of NB with all features.

Figure 20: Confusion matrix of NB with oxidative stress biomarkers as the main features.

Figure 21: Confusion matrix of weighted RF with all features.

Figure 22: Confusion matrix of weighted RF with oxidative stress biomarkers as the main features.

Figure 23: Confusion matrix of weighted LR with all features.

Figure 24: Confusion matrix of weighted LR with oxidative stress biomarkers as the main features.

Figure 25: Confusion matrix of weighted RF with all features.

Figure 26: Confusion matrix of weighted RF with oxidative stress biomarkers as the main features.

Figure 27: Confusion matrix of weighted LR with all features.

Figure 28: Confusion matrix of weighted LR with oxidative stress biomarkers as the main features.

Table 1: Hyperparameter value range and optimal values for all ML methods.